Pdf text recognition phase1. basic text import and layout #135
Open: olivetthered wants to merge 14 commits into scribusproject:master from olivetthered:pdfTextRecognition-phase1.-basic-text-import-and-layout
+979 −55
Import text from a PDF document, with some fuzzy matching to put lines of text that appear to belong together in the same text frame. Layout is good, but there is no font or styling support yet, and rotated text isn't supported either. It creates lots of text boxes if the PDF file reports lots of text regions; these also need joining up in a second pass to merge text regions that should be together, regardless of what the PDF file is reporting.
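To make the idea of fuzzy line matching concrete, here is a minimal sketch. It is not the code in this pull request: `TextLine`, `TextBlock`, `MatchTolerance`, `belongsTogether` and `groupLines` are hypothetical names, and the two tolerance factors are assumed values. It only shows one plausible rule, namely that a line joins the previous block when its left edge roughly lines up and its baseline sits within a small multiple of the line height below the previous line.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one line of text reported by the PDF.
struct TextLine {
    double left;      // x position of the first glyph
    double baseline;  // y position of the baseline (assumed to grow down the page)
    double height;    // nominal line height
    std::string text;
};

// One group of lines that would end up in a single text frame.
struct TextBlock {
    std::vector<TextLine> lines;
};

// Assumed tolerances, expressed in multiples of the previous line's height.
struct MatchTolerance {
    double indentFactor = 0.5;   // allowed drift of the left edge
    double leadingFactor = 1.5;  // largest accepted step between baselines
};

// True when `next` looks like the continuation of `prev`.
bool belongsTogether(const TextLine& prev, const TextLine& next,
                     const MatchTolerance& tol)
{
    const double step = next.baseline - prev.baseline;
    const bool alignedLeft =
        std::fabs(next.left - prev.left) <= tol.indentFactor * prev.height;
    const bool closeBelow =
        step > 0.0 && step <= tol.leadingFactor * prev.height;
    return alignedLeft && closeBelow;
}

// Single pass over the lines in reading order: extend the current block
// while lines keep matching, otherwise start a new block.
std::vector<TextBlock> groupLines(const std::vector<TextLine>& lines,
                                  const MatchTolerance& tol = {})
{
    std::vector<TextBlock> blocks;
    for (const TextLine& line : lines) {
        if (!blocks.empty() &&
            belongsTogether(blocks.back().lines.back(), line, tol))
            blocks.back().lines.push_back(line);
        else
            blocks.push_back(TextBlock{{line}});
    }
    return blocks;
}
```

A single pass like this only compares each line with the one before it, which is why a second pass would still be needed to merge blocks that the PDF reports out of order or as separate regions.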
UI for selecting text import as either vectors (default) or as text. More variables for text import will be needed so the user can configure how loose or strict the text block matching is, as I doubt that even good guesses will give a one-size-fits-all solution.
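As a rough illustration of what such user-facing variables could look like, the sketch below pairs the import mode with a single strictness setting. Everything here is an assumption made for illustration: `PdfTextImportMode`, `PdfTextImportOptions`, `matchStrictness` and `maxBaselineStep` do not exist in this pull request, and the mapping from strictness to a tolerance is an arbitrary linear one.

```cpp
#include <algorithm>

// Hypothetical names throughout; none of these come from the pull request.
enum class PdfTextImportMode { AsVectors, AsText };

struct PdfTextImportOptions {
    // Vectors stay the default, matching the behaviour described above.
    PdfTextImportMode mode = PdfTextImportMode::AsVectors;
    // 0.0 = loosest matching, 1.0 = strictest matching (assumed scale).
    double matchStrictness = 0.5;
};

// Maps the strictness setting to the largest baseline step, in line heights,
// that still counts as "the same text block". The end points 2.0 and 1.0 are
// assumed values chosen only to make the example concrete.
double maxBaselineStep(const PdfTextImportOptions& options)
{
    const double loosest = 2.0;
    const double strictest = 1.0;
    const double s = std::max(0.0, std::min(1.0, options.matchStrictness));
    return loosest + (strictest - loosest) * s;
}
```

Keeping the setting as one normalised value would let the dialog expose a single slider, while the importer stays free to derive several internal tolerances from it.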
Pending file review by ale |
I raised the following bug to have this pull request reviewed and integrated: |
implement text import as a new output dev inheriting from SlaOutputDev, and make the appropriate private members of SlaOutputDev protected
tidy up so we make minimal changes from master
fixed some space differences with master
override Type 3 font output, as we don't want to get confused and try to render them as vectors when vector rendering is only partially functional due to overrides from SlaOutputDev. Hopefully they can be implemented in the same way as addChar, but if that turns out to be infeasible the overrides can be removed and they can be rendered as vectors in the finished implementation.
…taken change the name of TextOutputDev to PdfTextOutputDev, as it's already taken. The PdfTextOutputDev naming matches the naming of PdfTextRecognition.
…variables to make the classes and members uniform across the PdfTextRecognition implementation. Rename all the classes, member variables and functions so they start with PdfText unless it's not appropriate.
moved the output dev into the pdftextrecognition files, meaning the slaoutput dev files no longer have any dependencies on pdftextrecognition. This keeps things neat and tidy and all together.
sync with upstream master
fix z-order/grouping. I don't know why I did this in the first place
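The commits above describe a text output device that inherits from the vector one, reuses members that were made protected, and overrides the Type 3 font hooks so those glyphs are not drawn as vectors. The sketch below shows that shape with stand-in classes only: `VectorDev`, `TextDev` and `showType3Glyph` are hypothetical, and the convention that a `true` return from `beginType3Char` makes the caller skip the glyph's drawing procedure is an assumption of this example, not a statement about the real classes.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the vector output device; not the real SlaOutputDev.
class VectorDev {
public:
    virtual ~VectorDev() = default;

    // Called once per glyph; the vector device records an outline for it.
    virtual void drawChar(char c) { m_items.push_back(std::string("outline:") + c); }

    // Assumed convention: returning true tells the caller that the device has
    // dealt with the Type 3 glyph itself, so its drawing procedure is skipped.
    virtual bool beginType3Char(char c) { (void)c; return false; }

    // Draws one path from a Type 3 glyph's drawing procedure.
    virtual void drawPath() { m_items.push_back("path"); }

    const std::vector<std::string>& items() const { return m_items; }

protected:
    // Protected rather than private, so a subclass can add page items
    // without duplicating the bookkeeping.
    std::vector<std::string> m_items;
};

// Hypothetical text-importing device built on top of the vector one.
class TextDev : public VectorDev {
public:
    // Collect characters instead of emitting outlines.
    void drawChar(char c) override { m_pending += c; }

    // Swallow Type 3 glyphs for now so they are not half-rendered as vectors.
    bool beginType3Char(char c) override { (void)c; return true; }

    // Flush the collected characters into one text item, using the
    // inherited protected member.
    void endText()
    {
        if (!m_pending.empty()) {
            m_items.push_back("text:" + m_pending);
            m_pending.clear();
        }
    }

private:
    std::string m_pending;
};

// Stand-in for how an interpreter might drive the Type 3 hooks.
void showType3Glyph(VectorDev& dev, char c)
{
    if (!dev.beginType3Char(c))
        dev.drawPath();
}
```

With this layout, removing the `beginType3Char` override is enough to fall back to vector rendering of Type 3 glyphs, which matches the fallback the commit message leaves open.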
Basic implementation of importing text from PDF files.
No fonts or styling yet, and no second pass over grouping text areas together, etc.; just some basic text area grouping and layout, but enough for additional features to be implemented fairly independently of each other. Currently only a single page is supported. To import the text from a PDF file as text, select the "import text as text" option (as opposed to the default "import text as vector") in the GUI when importing a vector file of PDF type.